Data and Methods

Raw data on PM2.5 has been provided by the German environmental protection agency (UBN). …

Data on total daily COVID-19 cases and deaths were provided by the Robert-Koch-Institut. The data were smoothed using a Gaussian loess function with span 0.3 to reduce variation caused by delayed reporting over the weekend, especially on Sunday and Monday. The daily number of new infections was computed from the day-to-day differences of the cumulative confirmed case counts.
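As a minimal sketch of the differencing step, the daily new infections can be derived from a cumulative series as shown here. The function name and the example numbers are made up for illustration, and clipping negative differences (retroactive corrections) to zero is an assumption, not something stated for the original analysis.

```python
def daily_new_cases(cumulative):
    """Return the day-to-day increases of a cumulative case series."""
    # The first day has no predecessor, so its increase is taken relative to zero.
    previous = [0] + list(cumulative[:-1])
    # Assumption: negative differences (retroactive corrections) are clipped to zero.
    return [max(today - before, 0) for today, before in zip(cumulative, previous)]


cumulative_cases = [0, 2, 5, 5, 9, 16]  # made-up illustrative totals
print(daily_new_cases(cumulative_cases))  # [0, 2, 3, 0, 4, 7]
```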

Overview of the mean development of PM2.5 and COVID-19 cases in Germany

The following figure shows the country-wide average of daily PM2.5 and new, smoothed COVID-19 cases. Black vertical dotted lines mark the start of the contact restrictions (March 14th) and of the shutdown (March 17th).

Dependency between development of PM2.5 and COVID-19 cases in Germany

Some studies compare the development of atmospheric parameters and COVID-19 cases over the entire available time series. As both air pollution and COVID-19 rates generally decrease after a shutdown event, at least part of the identified correlations is likely caused by this external event, especially in regions where local activity strongly influences local air quality.

To focus on the correlated development of PM2.5 and COVID-19 cases during the early, exponential growth phase, we restrict the time series to the period from 14 days before the first reported infection (the maximum incubation period) to the turning point of the infection dynamics shortly after the absolute maximum. For the following figure, the time series has been restricted to this period and the daily new, smoothed COVID-19 cases have been detrended using a Poisson regression.
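The detrending step can be illustrated with a small sketch. It assumes a log-linear Poisson trend in time, log(mu_t) = b0 + b1 * t, fitted by iteratively reweighted least squares, and it assumes that "detrended" means the raw residual (observed minus fitted). The exact model formula and residual type used for the figures are not stated in the text, and the function names are hypothetical.

```python
import math


def fit_poisson_trend(counts, iterations=25):
    """Fit log(mu_t) = b0 + b1 * t by iteratively reweighted least squares."""
    t = list(range(len(counts)))
    mu = [y + 0.1 for y in counts]  # start values that keep log() finite for zero counts
    eta = [math.log(m) for m in mu]
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Working response and weights for a Poisson model with log link.
        z = [e + (y - m) / m for e, y, m in zip(eta, counts, mu)]
        w = mu
        sw = sum(w)
        t_bar = sum(wi * ti for wi, ti in zip(w, t)) / sw
        z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw
        sxx = sum(wi * (ti - t_bar) ** 2 for wi, ti in zip(w, t))
        sxz = sum(wi * (ti - t_bar) * (zi - z_bar) for wi, ti, zi in zip(w, t, z))
        # Closed-form weighted least squares for intercept and slope.
        b1 = sxz / sxx
        b0 = z_bar - b1 * t_bar
        eta = [b0 + b1 * ti for ti in t]
        mu = [math.exp(e) for e in eta]
    return b0, b1, mu


def detrend(counts):
    """Assumption: detrended series = observed counts minus fitted Poisson trend."""
    _, _, mu = fit_poisson_trend(counts)
    return [y - m for y, m in zip(counts, mu)]


new_cases = [1, 2, 4, 7, 13, 20, 35, 60]  # made-up illustrative counts
residuals = detrend(new_cases)
```

Because the model contains an intercept, the raw residuals of the fitted trend sum to approximately zero, which is a quick sanity check on the fit.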

Based on that, the cross-correlation between PM2.5 and the detrended daily new COVID-19 cases shows that PM2.5 both leads by up to 6 days and lags by up to 9 days.
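To make the lead/lag reading concrete, the following sketch computes a Pearson correlation for each shift of one series against the other. The sign convention (positive lag means the first series leads) and the per-lag normalisation over the overlapping part are choices made for this illustration. Standard cross-correlation routines may use a different sign convention and normalise with the full-series mean and variance, so the numbers will not match them exactly.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag]; a positive lag means x leads y."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        result[lag] = pearson(a, b)
    return result


# Made-up example: y repeats x three days later, so x leads y by 3 days.
x = [math.sin(0.7 * i) + 0.3 * math.sin(2.1 * i) for i in range(60)]
y = [0.0, 0.0, 0.0] + x[:-3]
correlations = lagged_correlation(x, y, max_lag=9)
best_lag = max(correlations, key=correlations.get)
```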

Since the detrended time series is still rather non-stationary, and to get a better idea of the periods and date ranges associated with particular time lags, a wavelet coherence analysis is performed with a loess smoother.

The analysis shows that at periods of 4 to 5 days, the PM2.5 time series is leading around March 1st by up to 2 or 3 days (arrows pointing up and to the right). The situation changes towards April 1st. Here the COVID-19 cases take the lead, with a small time lag at a period of around 4 days and a lag of about 2 or 3 days at a period of about 8 days.
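Reading time lags from the arrows relies on the general relation between phase difference and period: lag = phase / (2π) × period. The short sketch below only illustrates this conversion. The function names are hypothetical, which arrow direction means "leading" depends on the wavelet package used, and a phase determines the lag only up to whole multiples of the period.

```python
import math


def phase_to_lag_days(phase_radians, period_days):
    """Convert a phase difference at a given period into a time lag in days."""
    return phase_radians / (2 * math.pi) * period_days


def lag_to_phase_degrees(lag_days, period_days):
    """Convert a time lag in days into the corresponding phase angle in degrees."""
    return 360.0 * lag_days / period_days


# At a period of 8 days, lags of 2 and 3 days correspond to 90 and 135 degrees.
print(lag_to_phase_degrees(2, 8), lag_to_phase_degrees(3, 8))
```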

Dynamic time warping clusters

The following figure shows clusters with similar development of daily COVID-19 cases.
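For readers unfamiliar with dynamic time warping, the sketch below shows the classic dynamic-programming distance that such clusterings build on. It is a generic textbook version with an absolute-difference cost and no window constraint. The clustering algorithm, step pattern, and software actually used for the figure are not stated in the text, so none of that is implied here.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two made-up case curves with the same shape but different timing are "close".
early = [1, 2, 3]
late = [1, 2, 2, 3]
print(dtw_distance(early, late))  # 0.0
```

A matrix of such pairwise distances between regional case curves could then be passed to any distance-based clustering method.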

The average of daily PM2.5 and smoothed new COVID-19 cases, along with their detrended series, is shown for each cluster in the figure below.

Based on that, the cross-correlation between PM2.5 and detrended daily new COVID-19 cases per cluster is shown below.

Finally, the following figures show the wavelet coherence analysis per cluster.

Explanatory power of PM2.5 for COVID-19 cases

The following generalized additive model (GAM) indicates the predictive power of PM2.5 for the COVID-19 cases in Germany as a whole.

The following GAMs indicate the predictive power of PM2.5 within the individual German clusters. The internal cross-validation error is based on a 10-fold leave-region-out cross-validation. For cluster 4, only a 2-fold cross-validation has been computed, which equals the number of regions in the cluster.

## [[1]]
## Generalized Additive Model using Splines 
## 
## 3086 samples
##    1 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (10 reps) 
## Summary of sample sizes: 2758, 2804, 2725, 2945, 2616, 2757, ... 
## Resampling results across tuning parameters:
## 
##   select  RMSE      Rsquared   MAE     
##   FALSE   1.880131  0.1264625  1.423384
##    TRUE   1.879478  0.1270898  1.423963
## 
## Tuning parameter 'method' was held constant at a value of GCV.Cp
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were select = TRUE and method = GCV.Cp.
## 
## [[2]]
## Generalized Additive Model using Splines 
## 
## 94 samples
##  1 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (2 reps) 
## Summary of sample sizes: 47, 47 
## Resampling results across tuning parameters:
## 
##   select  RMSE      Rsquared   MAE     
##   FALSE   70.85031  0.2307651  60.00179
##    TRUE   71.11774  0.1988518  60.71489
## 
## Tuning parameter 'method' was held constant at a value of GCV.Cp
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were select = FALSE and method = GCV.Cp.
## 
## [[3]]
## Generalized Additive Model using Splines 
## 
## 1787 samples
##    1 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (10 reps) 
## Summary of sample sizes: 1599, 1693, 1599, 1739, 1458, 1552, ... 
## Resampling results across tuning parameters:
## 
##   select  RMSE      Rsquared   MAE     
##   FALSE   4.755539  0.2646324  3.595651
##    TRUE   4.756866  0.2642241  3.596454
## 
## Tuning parameter 'method' was held constant at a value of GCV.Cp
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were select = FALSE and method = GCV.Cp.
## 
## [[4]]
## Generalized Additive Model using Splines 
## 
## 423 samples
##   1 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (5 reps) 
## Summary of sample sizes: 329, 282, 376, 329, 376 
## Resampling results across tuning parameters:
## 
##   select  RMSE      Rsquared   MAE     
##   FALSE   20.64013  0.3100091  15.96805
##    TRUE   20.64690  0.3069364  15.94215
## 
## Tuning parameter 'method' was held constant at a value of GCV.Cp
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were select = FALSE and method = GCV.Cp.
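The leave-region-out resampling described above can be sketched as follows. This is an illustration under assumptions, not the code behind the output: the function name is hypothetical, regions are assigned to folds round-robin, and the number of folds is capped at the number of regions, so a cluster with two regions yields two folds.

```python
def leave_region_out_folds(regions, k=10):
    """Build (train_indices, test_indices) pairs that hold out whole regions.

    regions: one region label per sample, in sample order.
    """
    unique_regions = sorted(set(regions))
    k = min(k, len(unique_regions))  # cannot have more folds than regions
    # Assumption: regions are distributed over folds round-robin.
    fold_of = {region: i % k for i, region in enumerate(unique_regions)}
    folds = []
    for fold in range(k):
        test = [i for i, r in enumerate(regions) if fold_of[r] == fold]
        train = [i for i, r in enumerate(regions) if fold_of[r] != fold]
        folds.append((train, test))
    return folds


# Made-up cluster with two regions of three daily samples each.
sample_regions = ["A", "A", "A", "B", "B", "B"]
for train, test in leave_region_out_folds(sample_regions):
    print(len(train), len(test))
```

Holding out complete regions means each model is evaluated on regions it has not seen during fitting, which is the point of this kind of cross-validation.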

Geographical overview of the clusters